Add missing lock to Constraint-aware append #7515
base: main
Conversation
Unfortunately, the assertion failure is not easy to reproduce in a test because it doesn't happen all the time; it typically requires a clean-slate session. I might spend some more time on writing a test for it later.
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #7515      +/-   ##
==========================================
+ Coverage   80.06%   81.32%   +1.25%
==========================================
  Files         190      242      +52
  Lines       37181    44891    +7710
  Branches     9450    11204    +1754
==========================================
+ Hits        29770    36507    +6737
- Misses       2997     3990     +993
+ Partials     4414     4394      -20
```

View full report in Codecov by Sentry.
Here's a script that reproduces the assertion crash:

```sql
select setseed(0.2);

create table readings(time timestamptz, device int, temp float);
select create_hypertable('readings', 'time', create_default_indexes => false);

insert into readings
select t, ceil(random()*10), random()*40
from generate_series('2022-06-01'::timestamptz, '2022-06-20 00:01:00', '1s') t;

alter table readings set (
  timescaledb.compress,
  timescaledb.compress_orderby = 'time',
  timescaledb.compress_segmentby = 'device'
);

--insert into readings values ('2022-06-01', 1, 1.0), ('2022-06-02', 2, 2.0), ('2022-08-02', 3, 3.0);

create index on readings (time);

select format('%I.%I', chunk_schema, chunk_name)::regclass as chunk
from timescaledb_information.chunks
where format('%I.%I', hypertable_schema, hypertable_name)::regclass = 'readings'::regclass
limit 1 \gset

set timescaledb.enable_chunk_append=off;

select compress_chunk(ch) from show_chunks('readings') ch;
select decompress_chunk('_timescaledb_internal._hyper_1_3_chunk');

--set timescaledb.enable_chunk_append=on;

explain (analyze)
select time, avg(temp), device from readings
where time > now() - interval '2 years 5 months 20 days' group by time, device order by time;
```
@akuzm After some more investigation, it seems this is related to parallel query plans, where the spawned workers don't have the relation locked. I don't know whether parallel workers are expected to have relations referenced in plan nodes locked at the beginning of plan execution.
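For reference, here is a condensed, paraphrased sketch of what PostgreSQL's generic executor path does when a range-table relation is opened (based on ExecGetRangeTableRelation() in execUtils.c; details vary between versions, so treat this as an approximation rather than the upstream code). Relations opened through that path do get a per-worker lock, so the gap is limited to code that inspects relations it never opens in the worker:

```c
#include "postgres.h"
#include "access/parallel.h"	/* IsParallelWorker() */
#include "access/table.h"		/* table_open() */
#include "nodes/parsenodes.h"	/* RangeTblEntry */
#include "storage/lmgr.h"		/* CheckRelationLockedByMe() */
#include "utils/rel.h"

/*
 * Condensed, paraphrased sketch of the locking behavior of
 * ExecGetRangeTableRelation().  Relations opened through this path are
 * locked in the worker; the problem discussed in this PR is a code path that
 * runs constraint exclusion on chunk relations that were never opened (and
 * hence never locked) in the worker.
 */
static Relation
open_rangetable_relation_sketch(RangeTblEntry *rte)
{
	Relation	rel;

	if (!IsParallelWorker())
	{
		/* The leader planned the query, so it should already hold the lock. */
		rel = table_open(rte->relid, NoLock);
		Assert(rte->rellockmode == AccessShareLock ||
			   CheckRelationLockedByMe(rel, rte->rellockmode, false));
	}
	else
	{
		/* A parallel worker takes its own local lock on the relation. */
		rel = table_open(rte->relid, rte->rellockmode);
	}

	return rel;
}
```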
Not sure; for example, parallel seq scans seemingly don't lock tables in the workers, although I only looked through the code and didn't check. In ChunkAppend, we ultimately switched to performing chunk exclusion in the parallel leader. Maybe it makes sense to do the same here as well. #5857
If we can do chunk exclusion only once, that is better. But TBH, I am not sure it is worth the effort given how seldom this scan node is used, so for now I think just taking this lock in the worker is enough. We can always implement leader-side exclusion later.
@akuzm Are we good to proceed with this fix? The missing locking appears only in parallel queries, so grabbing the lock if it is not already taken seems correct. It also seems like a very low-risk fix.
Sure. Let's move the lock before constify_restrictinfos.
Force-pushed from 01b22dd to bb9b0f2.
Moved the lock to before constify_restrictinfos.
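To make the discussion concrete, here is a minimal, hypothetical sketch of the pattern: take an AccessShareLock in the current process before running constraint exclusion, placing the lock ahead of any work that inspects the relation. The actual change in this PR lives in TimescaleDB's constraint-aware append code and places the lock before constify_restrictinfos(); the function and variable names below are illustrative, not the PR's code:

```c
#include "postgres.h"
#include "nodes/parsenodes.h"	/* RangeTblEntry */
#include "nodes/pathnodes.h"	/* PlannerInfo, RelOptInfo */
#include "optimizer/plancat.h"	/* relation_excluded_by_constraints() */
#include "storage/lmgr.h"		/* LockRelationOid() */

/*
 * Hypothetical sketch: make sure the relation is locked by this process
 * (e.g. by a parallel worker) before asking the planner machinery whether
 * constraints exclude it.  Re-acquiring a lock this backend already holds is
 * cheap, so no "is it locked yet?" pre-check is required.
 */
static bool
chunk_excluded_with_lock_sketch(PlannerInfo *root, RelOptInfo *rel,
								RangeTblEntry *rte)
{
	/*
	 * Parallel workers have not necessarily locked relations that are only
	 * referenced by the plan, so take an AccessShareLock here before any
	 * code that expects the relation to be locked runs.
	 */
	LockRelationOid(rte->relid, AccessShareLock);

	/* Runtime constraint exclusion is now safe for this chunk. */
	return relation_excluded_by_constraints(root, rel, rte);
}
```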
Force-pushed from bb9b0f2 to 7015f2d.
During parallel scans of Constraint-aware append, it seems like runtime chunk exclusion happens in every parallel worker. However, parallel workers don't grab a lock on a relation before calling relation_excluded_by_constraints(), which makes them hit an assertion: an AccessShareLock or higher must be held on the relation. Ideally, runtime chunk exclusion should happen only once, in the parallel leader, but that requires a bigger refactor left for the future.
Force-pushed from 7015f2d to e047e43.
During parallel scans of Constraint-aware append, it seems like runtime chunk exclusion happens in every parallel worker. However, parallel workers don't grab a lock on a relation before calling relation_excluded_by_constraints(), which makes them hit an assertion: an AccessShareLock or higher must be held on the relation.
Ideally, runtime chunk exclusion should happen only once, in the parallel leader, but that requires a bigger refactor left for the future.
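For background, the invariant behind the assertion is that opening a relation with NoLock is only legitimate when the backend already holds at least AccessShareLock on it. Below is a paraphrased sketch of that check; PostgreSQL enforces something of this shape in the relation_open() path, though the exact wording and placement vary by version:

```c
#include "postgres.h"
#include "miscadmin.h"		/* IsBootstrapProcessingMode() */
#include "storage/lmgr.h"	/* CheckRelationLockedByMe() */
#include "utils/rel.h"

/*
 * Paraphrased sketch of the check that trips in an assert-enabled build when
 * a relation is opened with NoLock while the current backend holds no lock
 * on it at all.  A parallel worker that never locked the chunk fails the
 * last condition.
 */
static void
assert_relation_already_locked(Relation rel, LOCKMODE lockmode)
{
	Assert(lockmode != NoLock ||
		   IsBootstrapProcessingMode() ||
		   CheckRelationLockedByMe(rel, AccessShareLock, true));
}
```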